
Impact of Macroeconomic Factors on General Electric Stock Price

One of the top companies in the industrial sector is General Electric (GE), which is a multinational conglomerate that produces a wide range of products including aircraft engines, power generation equipment, renewable energy solutions, and medical imaging devices. To analyze the stock price behavior of General Electric, an ARIMAX+ARCH/GARCH model can be employed.

By using an ARIMAX+ARCH/GARCH model to analyze the stock price behavior of General Electric, we can gain insights into how macroeconomic factors impact the company’s performance. For example, an increase in government spending on infrastructure projects or a rise in demand for commercial aircraft may lead to increased revenue and a rise in General Electric’s stock price. On the other hand, global economic uncertainty or changes in trade policies may lead to a decline in General Electric’s stock price.
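The analysis below relies on several R packages. A minimal setup block is sketched here, with each package inferred from the function calls used on this page (adjust to your own environment):

```r
# Packages inferred from the calls used throughout this analysis
library(quantmod)   # getSymbols(), chartSeries()
library(TTR)        # BBands()
library(plotly)     # plot_ly(), ggplotly(), subplot()
library(ggplot2)    # autoplot(), ggplot()
library(forecast)   # auto.arima(), Arima(), forecast(), ggAcf(), ggPacf(), checkresiduals()
library(astsa)      # sarima()
library(tseries)    # garch()
library(fGarch)     # garchFit()
library(FinTS)      # ArchTest()
library(urca)       # ur.pp()
library(reshape2)   # melt()
library(TSstudio)   # ts_plot()
library(lubridate)  # decimal_date()
library(dplyr)      # %>%, select()
```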

Time Series Plot

  • General Electric
  • Differenced Series Chart
  • GDP Growth
  • Inflation
  • Interest
  • Unemployment
Code
# get data
options("getSymbols.warning4.0"=FALSE)
options("getSymbols.yahoo.warning"=FALSE)


# auto.assign = FALSE returns the xts object directly; the second call
# assigns the symbol GE into the global environment for use below
data.info <- getSymbols("GE", src = 'yahoo', from = '2010-01-01', to = "2023-03-01", auto.assign = FALSE)
getSymbols("GE", src = 'yahoo', from = '2010-01-01', to = "2023-03-01")
df <- data.frame(Date = index(GE), coredata(GE))

# create Bollinger Bands
bbands <- BBands(GE[,c("GE.High","GE.Low","GE.Close")])

# join and subset data
df <- subset(cbind(df, data.frame(bbands[,1:3])), Date >= "2010-01-01")


# direction column for increasing and decreasing days (vectorized)
df$direction <- ifelse(df$GE.Close >= df$GE.Open, 'Increasing', 'Decreasing')

i <- list(line = list(color = '#FDD835'))
d <- list(line = list(color = '#7F7F7F'))

# plot candlestick chart

fig <- df %>% plot_ly(x = ~Date, type="candlestick",
          open = ~GE.Open, close = ~GE.Close,
          high = ~GE.High, low = ~GE.Low, name = "GE",
          increasing = i, decreasing = d) 
fig <- fig %>% add_lines(x = ~Date, y = ~up , name = "B Bands",
            line = list(color = '#ccc', width = 0.5),
            legendgroup = "Bollinger Bands",
            hoverinfo = "none", inherit = F) 
fig <- fig %>% add_lines(x = ~Date, y = ~dn, name = "B Bands",
            line = list(color = '#ccc', width = 0.5),
            legendgroup = "Bollinger Bands", inherit = F,
            showlegend = FALSE, hoverinfo = "none") 
fig <- fig %>% add_lines(x = ~Date, y = ~mavg, name = "Mv Avg",
            line = list(color = '#C052B3', width = 0.5),
            hoverinfo = "none", inherit = F) 
fig <- fig %>% layout(yaxis = list(title = "Price"))

# plot volume bar chart
fig2 <- df 
fig2 <- fig2 %>% plot_ly(x=~Date, y=~GE.Volume, type='bar', name = "GE Volume",
          color = ~direction, colors = c('#FDD835','#7F7F7F')) 
fig2 <- fig2 %>% layout(yaxis = list(title = "Volume"))

# create rangeselector buttons
rs <- list(visible = TRUE, x = 0.5, y = -0.055,
           xanchor = 'center', yref = 'paper',
           font = list(size = 9),
           buttons = list(
             list(count=1,
                  label='RESET',
                  step='all'),
             list(count=3,
                  label='3 YR',
                  step='year',
                  stepmode='backward'),
             list(count=1,
                  label='1 YR',
                  step='year',
                  stepmode='backward'),
             list(count=1,
                  label='1 MO',
                  step='month',
                  stepmode='backward')
           ))

# subplot with shared x axis
fig <- subplot(fig, fig2, heights = c(0.7,0.2), nrows=2,
             shareX = TRUE, titleY = TRUE)
fig <- fig %>% layout(title = "General Electric Stock Price: January 2010 - March 2023",
         xaxis = list(rangeselector = rs),
         legend = list(orientation = 'h', x = 0.5, y = 1,
                       xanchor = 'center', yref = 'paper',
                       font = list(size = 10),
                       bgcolor = 'transparent'))

fig
Code
log(data.info$`GE.Adjusted`) %>% diff() %>% chartSeries(theme=chartTheme('white'),up.col='#FDD835')

Code
#import the data
gdp <- read.csv("DATA/RAW DATA/gdp-growth.csv")

#change date format
gdp$Date <- as.Date(gdp$DATE , "%m/%d/%Y")

#drop DATE column
gdp <- subset(gdp, select = -c(1))

#export the cleaned data
gdp_clean <- gdp
write.csv(gdp_clean, "DATA/CLEANED DATA/gdp_clean_data.csv", row.names=FALSE)

#plot gdp growth rate 
fig <- plot_ly(gdp, x = ~Date, y = ~value, type = 'scatter', mode = 'lines',line = list(color = 'rgb(240, 128, 128)'))
fig <- fig %>% layout(title = "U.S GDP Growth Rate: 2010 - 2022",xaxis = list(title = "Time"),yaxis = list(title ="GDP Growth Rate"))
fig
Code
#import the data
inflation_rate <- read.csv("DATA/RAW DATA/inflation-rate.csv")

#cleaning the data
#remove unwanted columns
inflation_rate_clean <- subset(inflation_rate, select = -c(1,HALF1,HALF2))

#convert the data to time series data
inflation_data_ts <- ts(as.vector(t(as.matrix(inflation_rate_clean))), start=c(2010,1), end=c(2023,2), frequency=12)

#export the data
write.csv(inflation_rate_clean, "DATA/CLEANED DATA/inflation_rate_clean_data.csv", row.names=FALSE)


#plot inflation rate 
fig <- autoplot(inflation_data_ts, ylab = "Inflation Rate", color="#FFA07A")+ggtitle("U.S Inflation Rate: January 2010 - February 2023")+theme_bw()
ggplotly(fig)
Code
#import the data
interest_data <- read.csv("DATA/RAW DATA/interest-rate.csv")

#change date format
interest_data$Date <- as.Date(interest_data$Date , "%m/%d/%Y")

#export the cleaned data
interest_clean_data <- interest_data
write.csv(interest_clean_data, "DATA/CLEANED DATA/interest_rate_clean_data.csv", row.names=FALSE)

#plot interest rate 
fig <- plot_ly(interest_data, x = ~Date, y = ~value, type = 'scatter', mode = 'lines',line = list(color='rgb(219, 112, 147)'))
fig <- fig %>% layout(title = "U.S Interest Rate: January 2010 - March 2023",xaxis = list(title = "Time"),yaxis = list(title ="Interest Rate"))
fig
Code
#import the data
unemployment_rate <- read.csv("DATA/RAW DATA/unemployment-rate.csv")

#change date format
unemployment_rate$Date <- as.Date(unemployment_rate$Date , "%m/%d/%Y")

# export the data
write.csv(unemployment_rate, "DATA/CLEANED DATA/unemployment_rate_clean_data.csv", row.names=FALSE)

#plot unemployment rate 
fig <- plot_ly(unemployment_rate, x = ~Date, y = ~Value, type = 'scatter', mode = 'lines',line = list(color = 'rgb(189, 183, 107)'))
fig <- fig %>% layout(title = "U.S Unemployment Rate: January 2010 - March 2023",xaxis = list(title = "Time"),yaxis = list(title ="Unemployment Rate"))
fig

From 2010 to 2014, GE experienced a steady increase in its business operations due to strong financial performance and consistent dividend payouts. However, from 2014 to 2016, the company faced challenges such as slowing sales growth and increased competition in its core industrial markets, resulting in a decline in its business operations.

In response, GE began to streamline its operations and focus on core businesses, which helped the company to recover from 2016 to 2018. This trend continued into 2019, with GE reaching a multi-year high in mid-2019. However, the COVID-19 pandemic caused a brief dip in the company’s business operations in 2020, and GE’s stock price experienced some volatility in early 2021 due to global economic uncertainty and fluctuations in consumer demand for the company’s products.


As discussed before, the macroeconomic factors of GDP growth, inflation, interest rates, and the unemployment rate are closely interrelated and play a crucial role in the overall health and stability of an economy. From 2010 to 2023, the global economy experienced a mix of ups and downs, with periods of strong GDP growth followed by slowdowns and recessions.

The second plot shows the first difference of the logarithm of the adjusted General Electric stock price. Taking the first difference removes any long-term trends and transforms the time series into a stationary process. From the plot, we can observe that the first difference of the logarithm of the General Electric stock price appears to be stationary, as the mean and variance are roughly constant over time.
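This visual impression can be backed up with a formal stationarity test on the differenced log series. A sketch, assuming the `data.info` object downloaded earlier and the `tseries` package:

```r
library(tseries)

# First difference of the log of the adjusted close (log returns)
log_returns <- na.omit(diff(log(data.info$GE.Adjusted)))

# Augmented Dickey-Fuller test: H0 = unit root (non-stationary)
adf.test(as.numeric(log_returns))
```

A small p-value from the ADF test would corroborate the plot-based conclusion that the differenced series is stationary.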

Endogenous and Exogenous Variables

  • Plot
  • Correlation Heatmap
  • CCF GDP
  • CCF Interest
  • CCF Inflation
  • CCF Unemployment
Code
numeric_data <- c("GE.Adjusted","gdp", "interest", "inflation", "unemployment")
numeric_data <- final[, numeric_data]
normalized_data_numeric <- scale(numeric_data)
normalized_data <- ts(normalized_data_numeric, start = c(2010, 1), end = c(2021,10),frequency = 4)
ts_plot(normalized_data,
        title = "Normalized Time Series Data for GE Stock and Macroeconomic Variables",
        Ytitle = "Normalized Values",
        Xtitle = "Year")
Code
# Get upper triangle of the correlation matrix
get_upper_tri <- function(cormat){
    cormat[lower.tri(cormat)]<- NA
    return(cormat)
}
cormat <- round(cor(normalized_data_numeric),2)

upper_tri <- get_upper_tri(cormat)

melted_cormat <- melt(upper_tri, na.rm = TRUE)
# Create a ggheatmap
ggheatmap <- ggplot(melted_cormat, aes(Var2, Var1, fill = value))+
 geom_tile(color = "white")+
 scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
   midpoint = 0, limit = c(-1,1), space = "Lab", 
    name="Pearson\nCorrelation") +
  theme_minimal()+ # minimal theme
 theme(axis.text.x = element_text(angle = 45, vjust = 1, 
    size = 12, hjust = 1))+
 coord_fixed()

ggheatmap + 
geom_text(aes(Var2, Var1, label = value), color = "black", size = 4) +
theme(
  axis.title.x = element_blank(),
  axis.title.y = element_blank(),
  panel.grid.major = element_blank(),
  panel.border = element_blank(),
  panel.background = element_blank(),
  axis.ticks = element_blank(),
  legend.justification = c(1, 0),
  legend.position = c(0.6, 0.7),
  legend.direction = "horizontal")+
  guides(fill = guide_colorbar(barwidth = 7, barheight = 1,
                title.position = "top", title.hjust = 0.5))

Code
par(mfrow=c(1,1))
ccf_result <- ccf(normalized_data[, c("GE.Adjusted")], normalized_data[, c("gdp")], 
    lag.max = 300,
    main = "Cross-Correlation Plot for GE Stock Price and GDP Growth Rate",
    ylab = "CCF")

Code
cat("The sum of cross correlation function is", sum(abs(ccf_result$acf)))
The sum of cross correlation function is 5.53155
Code
par(mfrow=c(1,1))
ccf_result <- ccf(normalized_data[, c("GE.Adjusted")], normalized_data[, c("interest")], 
    lag.max = 300,
    main = "Cross-Correlation Plot for GE Stock Price and Interest Rate",
    ylab = "CCF")

Code
cat("The sum of cross correlation function is", sum(abs(ccf_result$acf)))
The sum of cross correlation function is 12.52537
Code
par(mfrow=c(1,1))
ccf_result <- ccf(normalized_data[, c("GE.Adjusted")], normalized_data[, c("inflation")], 
    lag.max = 300,
    main = "Cross-Correlation Plot for GE Stock Price and Inflation Rate",
    ylab = "CCF")

Code
cat("The sum of cross correlation function is", sum(abs(ccf_result$acf)))
The sum of cross correlation function is 16.47864
Code
par(mfrow=c(1,1))
ccf_result <- ccf(normalized_data[, c("GE.Adjusted")], normalized_data[, c("unemployment")], 
    lag.max = 300,
    main = "Cross-Correlation Plot for GE Stock Price and Unemployment Rate",
    ylab = "CCF")

Code
cat("The sum of cross correlation function is", sum(abs(ccf_result$acf)))
The sum of cross correlation function is 19.28415

The Normalized Time Series Data for Stock Price and Macroeconomic Variables plot shows the same variables as the first plot, standardized using the scale() function in R so that each variable has a mean of 0 and a standard deviation of 1. The heatmap analysis of the normalized data reveals that the inflation and unemployment rates exhibit strong correlations with the stock price, indicating that these variables may significantly influence stock price movements. Weaker correlations were observed between the stock price and GDP growth and interest rates, suggesting that these variables may have less impact on stock price fluctuations. The cross-correlation plots confirm these findings, indicating that the inflation and unemployment rates are the more suitable exogenous variables for the ARIMAX model when predicting General Electric’s movements.

Final exogenous variables: the macroeconomic indicators inflation rate and unemployment rate.

Endogenous and Exogenous Variables Plot

  • Plot
  • Check the stationarity
Code
final_data <- final %>% dplyr::select(Date, GE.Adjusted, inflation, unemployment)
numeric_data <- c("GE.Adjusted", "inflation","unemployment")
numeric_data <- final_data[, numeric_data]
normalized_data_numeric <- scale(numeric_data)
normalized_numeric_df <- data.frame(normalized_data_numeric)
normalized_data_ts <- ts(normalized_data_numeric, start = c(2010, 1), frequency = 4)

autoplot(normalized_data_ts, facets=TRUE) +
  xlab("Year") + ylab("") +
  ggtitle("General Electric Stock Price, Inflation Rate and Unemployment Rate in USA 2010-2023")

Code
# Convert your multivariate time series data to a matrix
final_data_ts_multivariate <- as.matrix(normalized_data_ts)

# Check for stationarity using Phillips-Perron test
phillips_perron_test <- ur.pp(final_data_ts_multivariate)  
summary(phillips_perron_test)

################################## 
# Phillips-Perron Unit Root Test # 
################################## 

Test regression with intercept 


Call:
lm(formula = y ~ y.l1)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.3707 -0.1831 -0.0486  0.1363  4.4969 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.0006863  0.0412911  -0.017    0.987    
y.l1         0.8584440  0.0417094  20.582   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5141 on 153 degrees of freedom
Multiple R-squared:  0.7347,    Adjusted R-squared:  0.7329 
F-statistic: 423.6 on 1 and 153 DF,  p-value: < 2.2e-16


Value of test-statistic, type: Z-alpha  is: -22.1435 

         aux. Z statistics
Z-tau-mu           -0.0164

The results of the Phillips-Perron unit root test provide strong evidence against the null hypothesis of a unit root: the coefficient on the lagged variable (0.858) is well below 1, and the test statistic Z-alpha of -22.1435 is strongly negative, leading to rejection of the unit-root null at conventional significance levels. This suggests that the variable y, which is being tested for stationarity, is likely stationary.

To determine whether the linear model requires an ARCH model, an ARCH test is conducted. The ACF and PACF plots are also used to identify suitable model values.

Model Fitting

  • Plot
  • ARCH Test
  • ACF Plot
  • PACF Plot
Code
normalized_numeric_df$GE.Adjusted<-ts(normalized_numeric_df$GE.Adjusted,start=decimal_date(as.Date("2010-01-01",format = "%Y-%m-%d")),frequency = 4)
normalized_numeric_df$inflation<-ts(normalized_numeric_df$inflation,start=decimal_date(as.Date("2010-01-01",format = "%Y-%m-%d")),frequency = 4)
normalized_numeric_df$unemployment<-ts(normalized_numeric_df$unemployment,start=decimal_date(as.Date("2010-01-01",format = "%Y-%m-%d")),frequency = 4)

fit <- lm(GE.Adjusted ~ inflation+unemployment, data=normalized_numeric_df)
fit.res<-ts(residuals(fit),start=decimal_date(as.Date("2010-01-01",format = "%Y-%m-%d")),frequency = 4)
############## Then look at the residuals ############
returns <- fit.res  %>% diff()
autoplot(returns)+ggtitle("Linear Model Returns")

Code
byd.archTest <- ArchTest(fit.res, lags = 1, demean = TRUE)
byd.archTest

    ARCH LM-test; Null hypothesis: no ARCH effects

data:  fit.res
Chi-squared = 25.401, df = 1, p-value = 4.657e-07
Code
ggAcf(returns) +ggtitle("ACF for returns")

Code
ggPacf(returns) +ggtitle("PACF for returns")

The ARCH LM-test was conducted with the null hypothesis of no ARCH effects. The test resulted in a chi-squared value of 25.401 with one degree of freedom, and a very low p-value of 4.657e-07. This suggests strong evidence against the null hypothesis, indicating the presence of ARCH effects in the data.

Based on the ACF and PACF plots, there is significant autocorrelation and partial autocorrelation at multiple lags, which suggests that the differenced series retains some moving-average structure. The plots point to candidate orders of p = 0 and q = 9.

ARIMAX Model

  • Auto Arima Model
  • Auto Arima Residuals
  • ARIMA (0,1,9) Plot
  • ARIMA (0,1,9) Model
  • Cross Validation
Code
xreg <- cbind(Inflation = normalized_data_ts[, "inflation"],
              Unemployment = normalized_data_ts[, "unemployment"])
fit.auto <- auto.arima(normalized_data_ts[, "GE.Adjusted"], xreg = xreg)
summary(fit.auto)
Series: normalized_data_ts[, "GE.Adjusted"] 
Regression with ARIMA(0,1,0) errors 

Coefficients:
      Inflation  Unemployment
         0.0262       -0.1293
s.e.     0.1538        0.0698

sigma^2 = 0.1252:  log likelihood = -18.35
AIC=42.71   AICc=43.22   BIC=48.5

Training set error measures:
                      ME     RMSE       MAE       MPE     MAPE      MASE
Training set -0.01726622 0.343433 0.2723993 -30.29885 65.14651 0.4320897
                   ACF1
Training set 0.09578302
Code
checkresiduals(fit.auto)


    Ljung-Box test

data:  Residuals from Regression with ARIMA(0,1,0) errors
Q* = 4.2215, df = 8, p-value = 0.8366

Model df: 0.   Total lags used: 8
Code
set.seed(1234)

model_output <- capture.output(sarima(fit.res,0,1,9)) 

Code
cat(model_output[124:160], model_output[length(model_output)], sep = "\n")

Call:
arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
    xreg = constant, transform.pars = trans, fixed = fixed, optim.control = list(trace = trc, 
        REPORT = 1, reltol = tol))

Coefficients:
         ma1      ma2     ma3     ma4      ma5      ma6     ma7      ma8
      0.0971  -0.0973  0.0659  0.2270  -0.2299  -0.1028  0.2258  -0.3203
s.e.  0.1525   0.1693  0.1704  0.1934   0.1897   0.1696  0.1777   0.1800
          ma9  constant
      -0.8656   -0.0174
s.e.   0.1923    0.0208

sigma^2 estimated as 0.08399:  log likelihood = -16.28,  aic = 54.56

$degrees_of_freedom
[1] 41

$ttable
         Estimate     SE t.value p.value
ma1        0.0971 0.1525  0.6369  0.5277
ma2       -0.0973 0.1693 -0.5748  0.5686
ma3        0.0659 0.1704  0.3868  0.7009
ma4        0.2270 0.1934  1.1737  0.2473
ma5       -0.2299 0.1897 -1.2118  0.2325
ma6       -0.1028 0.1696 -0.6058  0.5480
ma7        0.2258 0.1777  1.2709  0.2109
ma8       -0.3203 0.1800 -1.7802  0.0825
ma9       -0.8656 0.1923 -4.5006  0.0001
constant  -0.0174 0.0208 -0.8327  0.4098

$AIC
[1] 1.06981

$AICc
[1] 1.177653
Code
n=length(fit.res)
k= 51
 
 
rmse1 <- matrix(NA, (n-k), 4)
rmse2 <- matrix(NA, (n-k), 4)


st <- tsp(fit.res)[1]+(k-5)/4 

for(i in 1:(n-k))
{
  xtrain <- window(fit.res, end=st + i/4)
  xtest <- window(fit.res, start=st + (i+1)/4, end=st + (i+4)/4)
  
  #ARIMA(0,1,0) ARIMA(1,1,1)
  
  fit <- Arima(xtrain, order=c(0,1,0),
                include.drift=TRUE, method="ML")
  fcast <- forecast(fit, h=4)
  
  fit2 <- Arima(xtrain, order=c(0,1,9),
                include.drift=TRUE, method="ML")
  fcast2 <- forecast(fit2, h=4)


  rmse1[i,1:length(xtest)]   <- sqrt((fcast$mean-xtest)^2)
  rmse2[i,1:length(xtest)] <- sqrt((fcast2$mean-xtest)^2)
}

plot(1:4,colMeans(rmse1,na.rm=TRUE), type="l",col=2, xlab="horizon", ylab="RMSE")
lines(1:4, colMeans(rmse2,na.rm=TRUE), type="l",col=3)
legend("topleft", legend = c("ARIMA(0,1,0)", "ARIMA(0,1,9)"), col = 2:3, lty = 1)

The auto.arima function suggests the ARIMA(0,1,0) model, while the ACF and PACF plots point to a richer ARIMA(0,1,9) model. Upon comparison, the AIC and BIC values and the cross-validated RMSE of the ARIMA(0,1,0) model are lower than those of the other model, indicating that it is the better choice.

We can then proceed to choose the best GARCH model using ARIMA(0,1,0) as the base model.

Squared Residuals

  • Plot
  • ACF Plot
  • PACF Plot
Code
fit <- lm(GE.Adjusted ~ inflation+unemployment, data=normalized_numeric_df)
fit.res<-ts(residuals(fit),start=decimal_date(as.Date("2010-01-01",format = "%Y-%m-%d")),frequency = 4)
fit <- Arima(fit.res,order=c(0,1,0))
res=fit$res
plot(res^2,main='Squared Residuals')

Code
acf(res^2,24, main = "ACF Residuals Square")

Code
pacf(res^2,24, main = "PACF Residuals Square")

From the squared residuals of the best ARIMA model, the ACF and PACF plots show that most values lie within the blue significance bounds, indicating that the residuals behave approximately like white noise and that the model fits well. The few borderline spikes in these plots suggest candidate GARCH orders of up to p = 2 and q = 2. We can now proceed by fitting GARCH models over this grid of p and q values.

GARCH Model

  • Model
  • GARCH(1,1)
  • GARCH(2,1)
  • GARCH(1,2)
Code
model <- list() ## set counter
cc <- 1
for (p in 1:2) {
  for (q in 1:2) {
  
model[[cc]] <- garch(res,order=c(q,p),trace=F)
cc <- cc + 1
}
} 

## get AIC values for model evaluation
GARCH_AIC <- sapply(model, AIC) ## model with lowest AIC is the best
which(GARCH_AIC == min(GARCH_AIC))
[1] 1
Code
model[[which(GARCH_AIC == min(GARCH_AIC))]]

Call:
garch(x = res, order = c(q, p), trace = F)

Coefficient(s):
       a0         a1         b1  
1.080e-01  4.104e-01  1.072e-14  
Code
summary(garchFit(~garch(1,1),res, trace=F))

Title:
 GARCH Modelling 

Call:
 garchFit(formula = ~garch(1, 1), data = res, trace = F) 

Mean and Variance Equation:
 data ~ garch(1, 1)
<environment: 0x148b87ee8>
 [data = res]

Conditional Distribution:
 norm 

Coefficient(s):
       mu      omega     alpha1      beta1  
0.0026982  0.0219391  0.1528910  0.7287887  

Std. Errors:
 based on Hessian 

Error Analysis:
        Estimate  Std. Error  t value Pr(>|t|)  
mu      0.002698    0.056443    0.048   0.9619  
omega   0.021939    0.033878    0.648   0.5173  
alpha1  0.152891    0.161894    0.944   0.3450  
beta1   0.728789    0.314483    2.317   0.0205 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log Likelihood:
 -26.01023    normalized:  -0.5001967 

Description:
 Sat Jan  6 19:55:44 2024 by user:  


Standardised Residuals Tests:
                                Statistic p-Value  
 Jarque-Bera Test   R    Chi^2  0.4666824 0.7918834
 Shapiro-Wilk Test  R    W      0.9819011 0.6096038
 Ljung-Box Test     R    Q(10)  12.73464  0.2388891
 Ljung-Box Test     R    Q(15)  15.02448  0.4496551
 Ljung-Box Test     R    Q(20)  18.2661   0.5698836
 Ljung-Box Test     R^2  Q(10)  15.00172  0.1319993
 Ljung-Box Test     R^2  Q(15)  18.03358  0.2609042
 Ljung-Box Test     R^2  Q(20)  23.85804  0.2486481
 LM Arch Test       R    TR^2   10.46567  0.5751782

Information Criterion Statistics:
     AIC      BIC      SIC     HQIC 
1.154240 1.304335 1.143494 1.211783 
Code
summary(garchFit(~garch(2,1),res, trace=F))

Title:
 GARCH Modelling 

Call:
 garchFit(formula = ~garch(2, 1), data = res, trace = F) 

Mean and Variance Equation:
 data ~ garch(2, 1)
<environment: 0x13fb35128>
 [data = res]

Conditional Distribution:
 norm 

Coefficient(s):
        mu       omega      alpha1      alpha2       beta1  
0.00269818  0.02535776  0.16597577  0.00000001  0.69408610  

Std. Errors:
 based on Hessian 

Error Analysis:
        Estimate  Std. Error  t value Pr(>|t|)  
mu     2.698e-03   5.762e-02    0.047   0.9627  
omega  2.536e-02   4.246e-02    0.597   0.5503  
alpha1 1.660e-01   2.491e-01    0.666   0.5052  
alpha2 1.000e-08   2.846e-01    0.000   1.0000  
beta1  6.941e-01   3.998e-01    1.736   0.0825 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log Likelihood:
 -26.07056    normalized:  -0.5013568 

Description:
 Sat Jan  6 19:55:44 2024 by user:  


Standardised Residuals Tests:
                                Statistic p-Value  
 Jarque-Bera Test   R    Chi^2  0.4948671 0.7808021
 Shapiro-Wilk Test  R    W      0.9819355 0.6111646
 Ljung-Box Test     R    Q(10)  13.05933  0.2203734
 Ljung-Box Test     R    Q(15)  15.32623  0.4281834
 Ljung-Box Test     R    Q(20)  18.53299  0.5523415
 Ljung-Box Test     R^2  Q(10)  15.58891  0.1120207
 Ljung-Box Test     R^2  Q(15)  18.56009  0.2343648
 Ljung-Box Test     R^2  Q(20)  24.51462  0.220635 
 LM Arch Test       R    TR^2   10.70846  0.5540547

Information Criterion Statistics:
     AIC      BIC      SIC     HQIC 
1.195021 1.382641 1.178604 1.266950 
Code
summary(garchFit(~garch(1,2),res, trace=F))

Title:
 GARCH Modelling 

Call:
 garchFit(formula = ~garch(1, 2), data = res, trace = F) 

Mean and Variance Equation:
 data ~ garch(1, 2)
<environment: 0x11d50c518>
 [data = res]

Conditional Distribution:
 norm 

Coefficient(s):
        mu       omega      alpha1       beta1       beta2  
0.00269818  0.04213930  0.30074757  0.00000001  0.46607609  

Std. Errors:
 based on Hessian 

Error Analysis:
        Estimate  Std. Error  t value Pr(>|t|)    
mu     2.698e-03   3.484e-02    0.077 0.938266    
omega  4.214e-02   5.345e-02    0.788 0.430434    
alpha1 3.007e-01   8.078e-02    3.723 0.000197 ***
beta1  1.000e-08         NaN      NaN      NaN    
beta2  4.661e-01   3.142e-01    1.484 0.137934    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log Likelihood:
 -25.62945    normalized:  -0.4928741 

Description:
 Sat Jan  6 19:55:44 2024 by user:  


Standardised Residuals Tests:
                                Statistic p-Value  
 Jarque-Bera Test   R    Chi^2  0.6040771 0.7393096
 Shapiro-Wilk Test  R    W      0.984106  0.7108861
 Ljung-Box Test     R    Q(10)  13.05999  0.2203371
 Ljung-Box Test     R    Q(15)  15.24153  0.4341636
 Ljung-Box Test     R    Q(20)  18.46496  0.5568079
 Ljung-Box Test     R^2  Q(10)  14.60171  0.1472715
 Ljung-Box Test     R^2  Q(15)  17.70743  0.2783592
 Ljung-Box Test     R^2  Q(20)  22.47545  0.3152782
 LM Arch Test       R    TR^2   10.36302  0.5841457

Information Criterion Statistics:
     AIC      BIC      SIC     HQIC 
1.178056 1.365675 1.161639 1.249985 

Based on the analysis of the different GARCH models, GARCH(1,1) appears to be the optimal choice. Although the AIC values of the three models are relatively close, GARCH(1,1) attains the lowest AIC and BIC, and its beta1 coefficient is statistically significant, unlike the extra parameters in the larger models. Therefore, we can conclude that the GARCH(1,1) model is the best fit for the data.

Best Model

  • ARIMA Model
  • GARCH Model
  • Volatility
Code
#fiting an ARIMA model to the Inflation variable
inflation_fit<-auto.arima(normalized_numeric_df$inflation) 
finflation<-forecast(inflation_fit)

#fitting an ARIMA model to the Unemployment variable
unemployment_fit<-auto.arima(normalized_numeric_df$unemployment) 
funemployment<-forecast(unemployment_fit)

# best model fit for forcasting
xreg <- cbind(Inflation = normalized_data_ts[, "inflation"],
              Unemployment = normalized_data_ts[, "unemployment"])



summary(arima.fit <- Arima(normalized_data_ts[, "GE.Adjusted"], order = c(0,1,0), xreg = xreg))
Series: normalized_data_ts[, "GE.Adjusted"] 
Regression with ARIMA(0,1,0) errors 

Coefficients:
      Inflation  Unemployment
         0.0262       -0.1293
s.e.     0.1538        0.0698

sigma^2 = 0.1252:  log likelihood = -18.35
AIC=42.71   AICc=43.22   BIC=48.5

Training set error measures:
                      ME     RMSE       MAE       MPE     MAPE      MASE
Training set -0.01726622 0.343433 0.2723993 -30.29885 65.14651 0.4320897
                   ACF1
Training set 0.09578302
Code
summary(final.fit <- garchFit(~garch(1,1), res,trace = F))

Title:
 GARCH Modelling 

Call:
 garchFit(formula = ~garch(1, 1), data = res, trace = F) 

Mean and Variance Equation:
 data ~ garch(1, 1)
<environment: 0x13b3483b0>
 [data = res]

Conditional Distribution:
 norm 

Coefficient(s):
       mu      omega     alpha1      beta1  
0.0026982  0.0219391  0.1528910  0.7287887  

Std. Errors:
 based on Hessian 

Error Analysis:
        Estimate  Std. Error  t value Pr(>|t|)  
mu      0.002698    0.056443    0.048   0.9619  
omega   0.021939    0.033878    0.648   0.5173  
alpha1  0.152891    0.161894    0.944   0.3450  
beta1   0.728789    0.314483    2.317   0.0205 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log Likelihood:
 -26.01023    normalized:  -0.5001967 

Description:
 Sat Jan  6 19:55:44 2024 by user:  


Standardised Residuals Tests:
                                Statistic p-Value  
 Jarque-Bera Test   R    Chi^2  0.4666824 0.7918834
 Shapiro-Wilk Test  R    W      0.9819011 0.6096038
 Ljung-Box Test     R    Q(10)  12.73464  0.2388891
 Ljung-Box Test     R    Q(15)  15.02448  0.4496551
 Ljung-Box Test     R    Q(20)  18.2661   0.5698836
 Ljung-Box Test     R^2  Q(10)  15.00172  0.1319993
 Ljung-Box Test     R^2  Q(15)  18.03358  0.2609042
 Ljung-Box Test     R^2  Q(20)  23.85804  0.2486481
 LM Arch Test       R    TR^2   10.46567  0.5751782

Information Criterion Statistics:
     AIC      BIC      SIC     HQIC 
1.154240 1.304335 1.143494 1.211783 
Code
ht <- final.fit@h.t #a numeric vector with the conditional variances (h.t = sigma.t^delta)

#############################
data=data.frame(final)
data$Date<-as.Date(data$Date,"%Y-%m-%d")


data2= data.frame(ht,data$Date)
ggplot(data2, aes(y = ht, x = data.Date)) + geom_line(col = '#FDD835') + ylab('Conditional Variance') + xlab('Date')

From the ARIMA(0,1,0) fit, the training set error measures suggest a good fit, with low mean absolute error, root mean squared error, and residual autocorrelation. The GARCH(1,1) model is then used to estimate the volatility of the residuals of the regression model. It includes a mean equation for the mean of the residuals and a variance equation that models their conditional variance. The mean-equation coefficient suggests that the mean of the residuals is close to zero, while the variance-equation coefficients indicate that the conditional variance depends on past conditional variances and past squared residuals. The model’s AIC, BIC, SIC, and HQIC values are all relatively low, indicating a good fit, and the standardized residuals tests indicate that the residuals are approximately normally distributed with no significant autocorrelation.

The volatility of the model seems high in 2020 but has decreased gradually in the past few months. This could indicate that the asset’s price was experiencing a lot of fluctuations in 2020, but the market has stabilized recently.

Model Diagnostics

  • Residuals
  • QQ Plot
  • Box Test
Code
fit2<-garch(res,order=c(1,1),trace=F)
checkresiduals(fit2) 

Code
qqnorm(fit2$residuals, pch = 1)
qqline(fit2$residuals, col = "blue", lwd = 2)

Code
Box.test (fit2$residuals, type = "Ljung")

    Box-Ljung test

data:  fit2$residuals
X-squared = 0.27111, df = 1, p-value = 0.6026

The ACF plot of the residuals shows all values within the blue significance bounds, which indicates that the residuals are not significantly autocorrelated, and the residual values lying between -2 and 2 are within an acceptable range. Additionally, the QQ plot shows the points falling close to the reference line, a good indication that the residuals are approximately normally distributed.

For the Box-Ljung test, the p-value of 0.6026 indicates that the model’s residuals are not significantly autocorrelated, meaning that the model has captured most of the information in the data. This suggests that the model is a good fit and has accounted for most of the underlying patterns, so we can rely on its predictions to make informed decisions.

Forecast

Code
predict(final.fit, n.ahead = 5, plot=TRUE)

  meanForecast meanError standardDeviation lowerInterval upperInterval
1  0.002698178 0.4377440         0.4377440    -0.8552644     0.8606607
2  0.002698178 0.4369055         0.4369055    -0.8536209     0.8590172
3  0.002698178 0.4361648         0.4361648    -0.8521692     0.8575655
4  0.002698178 0.4355108         0.4355108    -0.8508872     0.8562836
5  0.002698178 0.4349332         0.4349332    -0.8497553     0.8551517

The forecasted plot is based on the best model, ARIMAX(0,1,0)+GARCH(1,1). This model takes into account the differenced (random-walk) structure of the data as well as the impact of the exogenous variables on the time series, while the GARCH component accounts for the volatility clustering in the data. Overall, this model is well-suited to forecasting future values of the time series.

Equation of the Model

The equation of the ARIMAX(0,1,0) model (a regression with ARIMA(0,1,0) errors) is:

\(Y(t) = Y(t-1) + \gamma_1 \Delta X_1(t) + \gamma_2 \Delta X_2(t) + \epsilon(t)\)

where \(Y(t)\) is the time series variable, \(X_1(t)\) and \(X_2(t)\) are the exogenous inflation and unemployment series with coefficients \(\gamma_1\) and \(\gamma_2\), \(\Delta\) denotes the first difference, and \(\epsilon(t)\) is the error term.

The equation of the GARCH(1,1) model is:

\(\sigma^2(t) = \alpha_0 + \alpha_1\epsilon^2(t-1) + \beta_1\sigma^2(t-1)\)

where \(\sigma^2(t)\) is the conditional variance at time \(t\), \(\alpha_0\) is a constant, \(\alpha_1\) and \(\beta_1\) are the parameters, and \(\epsilon(t)\) is the error term.

The combined equations of the ARIMAX(0,1,0)+GARCH(1,1) model are:

\(Y(t) = Y(t-1) + \gamma_1 \Delta X_1(t) + \gamma_2 \Delta X_2(t) + \epsilon(t)\)

\(\epsilon(t) = \sigma(t)\,z(t)\), where \(z(t)\) is white noise with unit variance

\(\sigma^2(t) = \alpha_0 + \alpha_1\epsilon^2(t-1) + \beta_1\sigma^2(t-1)\)
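To make these equations concrete, the data-generating process they describe can be simulated directly, using the GARCH(1,1) coefficients fitted above (omega, alpha1, beta1 from the garchFit summary; the exogenous regression terms are omitted for simplicity):

```r
set.seed(42)

# GARCH(1,1) coefficients from the garchFit summary above
a0 <- 0.0219391   # omega
a1 <- 0.1528910   # alpha1
b1 <- 0.7287887   # beta1
n  <- 200

z      <- rnorm(n)     # z(t): white noise with unit variance
sigma2 <- numeric(n)   # sigma^2(t): conditional variance
eps    <- numeric(n)   # epsilon(t): error term

sigma2[1] <- a0 / (1 - a1 - b1)   # start at the unconditional variance
eps[1]    <- sqrt(sigma2[1]) * z[1]

for (t in 2:n) {
  sigma2[t] <- a0 + a1 * eps[t - 1]^2 + b1 * sigma2[t - 1]  # variance equation
  eps[t]    <- sqrt(sigma2[t]) * z[t]                       # epsilon(t) = sigma(t) * z(t)
}

Y <- cumsum(eps)   # Y(t) = Y(t-1) + epsilon(t): random walk driven by GARCH errors
plot.ts(cbind(Price = Y, Volatility = sqrt(sigma2)),
        main = "Simulated ARIMA(0,1,0)+GARCH(1,1) process")
```

Because alpha1 + beta1 is below 1, the simulated variance is mean-reverting, which mirrors the volatility clustering and gradual stabilization visible in the conditional-variance plot above.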